FUNDAMENTAL ANALYSIS

Market Microstructure and Trading Systems

Espinosa Garcia Lyha, lyha.espinosa@iteso.mx
Manica Pineda Bryan Azahel, if722176@iteso.mx
Ruiz Magaña Juan Pablo, if721093@iteso.mx
Vazquez Vargas Ana Cristina, if71215@iteso.mx


July 2022 | Repository: Link


Trading System From a Fundamental Perspective

Unemployment rate impact on the USDMXN historical series



Abstract

The purpose of this project is to analyze the release of an economic indicator as an event that can generate temporary patterns in the price reactions of a financial asset. These price reactions help to detect patterns and use them to build a trading system. For this specific case study, we use the unemployment rate as the economic indicator and USDMXN as the currency.


1. Introduction


How can an economic indicator influence the behaviour of a selected currency? This question will be answered in the following report, going from a contextualization and exploration of the obtained data to creating an optimized trading system that returns an overall profit, using the unemployment rate and the USDMXN price history to make decisions over the transactions made with the previously mentioned currency.

The first section of this Market Microstructure and Trading Systems project consists of a historical analysis of both of the mentioned time series, leading to a conceptual definition of the indicator and an empirical capital management strategy.

The second section of the project is divided into statistical and computational aspects. For the statistical analysis, different tests are performed to give a broad understanding of the unemployment rate. In computational terms, the subsection shows the juxtaposition between the USDMXN time series and the unemployment rate data, creating scenarios to correlate the information. Furthermore, based on the analyzed data, certain parameters will be defined based on specific metrics calculated for the currency's behaviour every time the indicator occurs, in other words, for a specific time lapse.

The following section contains the definition of the trading system. The proposed system has as parameters the trade Volume, Take Profit and Stop Loss. Briefly, the trading system makes a transaction every time the indicator is reported and will not close the transaction (perform the opposite operation) until the price touches a barrier defined by the Take Profit or Stop Loss.

The final section mainly contains the trading system optimization, performed by a PSO algorithm with its corresponding constraints over a profitability metric such as the Sharpe ratio. Once the system has been optimized, its performance is presented with performance attribution metrics like the Sharpe, Sortino and Treynor ratios.

Overall, the purpose of this report is to take the skills learned in this course and apply them to real-life data. Evidence of this learning process is presented throughout the notebook.


2. Install/Load Packages and Dependencies


2.1 Python Packages

In order to run this notebook, it is necessary to have installed the packages listed below and/or to have a requirements.txt file with the following:

  • pandas>=1.1.1
  • numpy>=1.19.1
  • datetime >=1.0.0
  • jupyter>=1.0.0
  • statsmodels>=0.13.1
  • scipy>=1.8.0
  • plotly>=5.6.0
  • chart-studio>=1.1.0
  • pyswarm>=0.6

2.2 File Dependencies

The following are the file dependencies that are needed to run this notebook:

  • MP_M1_2018.csv : USDMXN Future contracts price series for 2018.
  • MP_M1_2019.csv : USDMXN Future contracts price series for 2019.
  • MP_M1_2020.csv : USDMXN Future contracts price series for 2020.
  • Unemployment_Rate.xlsx : Unemployment rate series from 2018 to 2020.

2.3 Install Packages

In [1]:
%%capture
# Install all the pip packages in the requirements.txt
import sys
!{sys.executable} -m pip install -r requirements.txt
In [1]:
### Import libraries to use
import pandas as pd
import numpy as np
from statsmodels.graphics.gofplots import qqplot
from pyswarm import pso
import pyswarms as ps
import warnings

### Import scripts
import functions as fn
import visualizations as vis
import data as dt

usdmxn_18 = pd.read_csv('files/MP_M1_2018.csv')
usdmxn_19 = pd.read_csv('files/MP_M1_2019.csv')
usdmxn_20 = pd.read_csv('files/MP_M1_2020.csv')
unemployment = pd.read_excel('files/Unemployment_Rate.xlsx')

3. Data Description


Currency data

The currency data contains the information related to the USDMXN pair, the result of concatenating 3 CSV files. Each file contains one year of data (2018, 2019, 2020) of prices for continuous future contracts. The source of this data is unknown since it was given as class material.

Indicator data

The unemployment rate data comes in an Excel file. In this case, the information corresponds to the same 3-year period obtained for the currency data. The source of this material is FactSet.


4. Historical Analysis


4.1 Financial Aspects


The USDMXN behavior is analyzed around the release dates of the US unemployment rate.

4.1.1 Unemployment Rate

The chosen indicator to analyze for its impact on the behaviour of the previously stated currency is the unemployment rate. The unemployment rate can be interpreted as the percentage of unemployed people relative to the total labor force. In other words, this rate represents the proportion of the labor force that is not currently working and is actively searching for a job.

The next formula shows how the indicator is obtained. It is the result of the number of people without a job divided by the total labor force, i.e., all people, whether employed or unemployed, contained in the working-age population category. For the US this refers to people aged between 15 and 64. Finally, to express the indicator as a percentage, the result of the division is multiplied by 100.

$$Unemployment \ Rate = \frac{Number\ of \ people \ unemployed}{Total \ labor \ force} * \left( 100 \right )$$

This indicator is reported monthly by the U.S. Bureau of Labor Statistics as a result of applying Labor Force Surveys. In this project the U-3 unemployment rate will be used as a reference; however, the Bureau of Labor Statistics presents various results for the unemployment rate taking different factors into consideration, like the types of jobs the employed people have or how long the unemployed have been jobless.
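The formula above can be sketched in code as follows; the figures in the example are hypothetical, not BLS data.

```python
def unemployment_rate(unemployed: float, labor_force: float) -> float:
    """Unemployment rate as a percentage of the total labor force."""
    return unemployed / labor_force * 100

# Hypothetical figures: 6 million unemployed, 150 million in the labor force
print(unemployment_rate(6.0e6, 150.0e6))  # 4.0
```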

4.1.2 Validations

Since the data has been gathered for the variables of interest, visual validations will now be shown for each indicator announcement. The purpose of doing this is to create empirical strategies for different scenarios that will influence the trading system defined in the following paragraphs.

These validations consist of 3 functions. The first function returns a dataframe to validate, the second one visualizes the behaviour of the closing prices, and the last function creates a summary of the strategy in terms of profit and loss.

The third function, also called empiric_trade, receives as an argument the dataframe to validate. Once the function gets the data, it takes the price of our currency at the same timestamp the indicator was reported. It also defines a subset covering approximately thirty minutes after the previously mentioned timestamp. Once this subset has been created, it is possible to define the required variables from the known information.

First, for the direction, all it takes is to subtract the first price from the last price. If the answer is positive, the closing price is greater than the opening price (when the indicator was announced); if it is negative, the opposite can be inferred. Following this reasoning, since a trading decision occurs every time the indicator is released, the direction can drive the decision making. If the price has a rising tendency, a buy transaction can be expected to be filled. On the other hand, with a decreasing tendency, a sell transaction will be performed.

The next parameter to be set is the transaction volume. This is simply defined by the median value of the volume contained in the dataframe, making it a changing parameter for each dataframe with a statistical justification.

Both Take Profit and Stop Loss are defined in pip units; the function defines ranges between 0 and the maximum or minimum price variation represented in pips, depending on the transaction. A buy operation will show losses if the price continues to go down, because we paid a higher price for the traded asset. This means that the Take Profit barrier will be a random number between 0 and the maximum price variation. Similarly, the Stop Loss barrier will be defined in a range between 0 and the minimum price variation.

Finally, to get the overall profit and loss it only takes subtracting the transaction price from the corresponding barrier price, whether it is a Take Profit or a Stop Loss.

For a sell scenario, a loss will be perceived if the price continues to go up, a consequence of having sold the asset at a cheaper price. The Take Profit and Stop Loss ranges are defined contrary to the way they are defined in the buy scenario. In addition, since the utility is now realized when the price goes down, to get the Take Profit barrier price the Take Profit in pip units is subtracted from the price at the time of the indicator release, while the pips for the Stop Loss are added to that same price.

In this case, to get the overall profit and loss as a monetary quantity, it is enough to subtract the barrier prices from the price of the asset at the indicator report date.
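The buy and sell barrier arithmetic described above can be sketched as follows. The pip size (0.0001) and every name here are illustrative assumptions, not the actual empiric_trade implementation.

```python
# Sketch of the buy/sell barrier arithmetic described above. The pip size
# (0.0001) and all names are illustrative assumptions, not the actual
# empiric_trade implementation.
PIP = 0.0001

def trade_pnl(entry_price, direction, tp_pips, sl_pips, volume):
    """Return (tp_barrier, sl_barrier, profit, loss) for a trade.

    direction = +1 for a buy (profit if the price rises),
    direction = -1 for a sell (profit if the price falls).
    """
    tp_barrier = entry_price + direction * tp_pips * PIP
    sl_barrier = entry_price - direction * sl_pips * PIP
    profit = direction * (tp_barrier - entry_price) * volume  # barrier minus entry
    loss = direction * (sl_barrier - entry_price) * volume
    return tp_barrier, sl_barrier, profit, loss

# Buy example: entry at 19.50, take profit 100 pips up, stop loss 80 pips down
tp, sl, profit, loss = trade_pnl(19.50, +1, 100, 80, volume=100_000)
print(round(profit, 2), round(loss, 2))  # 1000.0 -800.0
```

For a sell the same function flips the barriers: the Take Profit lands below the entry price and the Stop Loss above it, matching the description above.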

To exemplify the explanation above, the next chunks contain five validations that illustrate this idea at a numerical level.

In [2]:
### Let's see docstring for data_manipulation function
help(fn.data_manipulation)
Help on function data_manipulation in module functions:

data_manipulation(forex_1, forex_2, forex_3, indicator)
    This function creates a data frames for further use. It only accepts 3 year worth of data.
    It requires information of a designated currency and a chosen indicator.
    Parameters
    ----------
    forex_1 : CSV File
           The first year of currency data
    
    forex_2 : CSV File
           The second year of currency data
    
    forex_3 : CSV File
           The third year of currency data
    
    indicator : CSV File
           All three years worth of indicator data

In [3]:
help(fn.Event_Data)
Help on function Event_Data in module functions:

Event_Data(usdmxn, unemployment)
    This function return a dataframe for each indicator event. This includes 30 minutes prior and 30 after the indicator has
    been annouced.
    Parameters
    ----------
    usdmxn : Dataframe
        All three years worth of forex data

In [4]:
data_validation = fn.data_manipulation(usdmxn_18,usdmxn_19,usdmxn_20,unemployment)
events = fn.Event_Data(data_validation[0],unemployment)

First Validation

The first validation will be done using the dataframe shown below:

In [5]:
help(fn.validation)
Help on function validation in module functions:

validation(data, n_val)
    This function returns a dataframe for a visual and empirical validation.
    
    Parameters
    ----------
    data : Dataframe
        Dataframe to validate
        
    n_val: numeric
        Defines the number of validation
        
    Return
    ----------
    Trial : Dataframe

In [6]:
val_1 = fn.validation(events,1)
In [7]:
help(vis.val_graph)
Help on function val_graph in module visualizations:

val_graph(df)
    This function plots the closing price time series. It also includes a line where the indicator was announced 
    
    Parameters
    ----------
    df : Dataframe
        Dataframe to validate
        
    Return
    ----------
    fig : Plot

For a visual interpretation of the price behavior, the val_graph function displays a plot of the price series and adds a line where the indicator was announced. In this case, prices tend to continue going down after the indicator was reported. Considering that the transaction should be done at the time of the indicator report, the advice would be to sell.

In [8]:
vis.val_graph(val_1)

The empirical design of the strategy states that the operation to perform is to "sell" given that the direction is negative. The Take Profit pip variation will be in terms of the minimum variation, while the Stop Loss pip variation will be defined in terms of the maximum variation. In the profit and loss columns, a monetary result is shown. If the price touches the Take Profit barrier the expected utility would be \$32265.66. On the other hand, the loss if the price reaches the Stop Loss barrier would be -\$23092.09.

In [9]:
fn.empiric_trade(val_1)
Out[9]:
Operation Direction Volume Takeprofit(pip) Stoploss(pip) Profit($) Loss($)
Operation Sell -1.0 632660.0 510 365 32265.66 -23092.09

Second Validation

The second validation will be done using the dataframe shown below:

In [10]:
val_2 = fn.validation(events,2)

In this second validation, even though the price has a tendency to decrease, at the end of the analyzed time period the price starts increasing. Visually it can be stated that the direction of the transaction will be positive given that the last price is higher than the price cut by the indicator line.

In [11]:
vis.val_graph(val_2)

The empirical design of the strategy states that the operation to perform is to "buy" given that the direction is positive. The Take Profit pip variation will be in terms of the maximum variation, while the Stop Loss pip variation will be defined in terms of the minimum variation. In the profit and loss columns, a monetary result is shown. If the price touches the Take Profit barrier the expected utility would be \$2588.75. On the other hand, the loss if the price reaches the Stop Loss barrier would be -\$2992.5.

In [12]:
fn.empiric_trade(val_2)
Out[12]:
Operation Direction Volume Takeprofit(pip) Stoploss(pip) Profit($) Loss($)
Operation Buy 1.0 237500.0 109 126 2588.75 -2992.5

Third Validation

The third validation will be done using the dataframe shown below:

In [13]:
val_3 = fn.validation(events,3)

For this specific scenario, the price tendency is to the upside. Assuming that the trade is performed at the red dashed axis, the logical result of the empirical trade is a recommendation to buy once the indicator has been reported, since the direction seems to be positive.

In [14]:
vis.val_graph(val_3)

The empirical design of the strategy states that the operation to perform is to "buy" given that the direction is positive. The Take Profit pip variation will be in terms of the maximum variation, while the Stop Loss pip variation will be defined in terms of the minimum variation. In the profit and loss columns, a monetary result is shown. If the price touches the Take Profit barrier the expected utility would be \$7397.93. On the other hand, the loss if the price reaches the Stop Loss barrier would be -\$5294.59.

In [15]:
fn.empiric_trade(val_3)
Out[15]:
Operation Direction Volume Takeprofit(pip) Stoploss(pip) Profit($) Loss($)
Operation Buy 1.0 145057.5 510 365 7397.9325 -5294.59875

Fourth Validation

The fourth validation will be done using the dataframe shown below:

In [16]:
val_4 = fn.validation(events,4)

Since the behaviour of the time series in the visualization is decreasing once the indicator is reported, the advised operation would be to sell, based on the fact that waiting to sell in the future, at a lower price, would translate into losses compared with closing the operation later.

In [17]:
vis.val_graph(val_4)

The empirical design of the strategy states that the operation to perform is to "sell" given that the direction is negative. The Take Profit pip variation will be in terms of the minimum variation, while the Stop Loss pip variation will be defined in terms of the maximum variation. In the profit and loss columns, a monetary result is shown. If the price touches the Take Profit barrier the expected utility would be \$1324.62. On the other hand, the loss if the price reaches the Stop Loss barrier would be -\$24.30.

In [18]:
fn.empiric_trade(val_4)
Out[18]:
Operation Direction Volume Takeprofit(pip) Stoploss(pip) Profit($) Loss($)
Operation Sell -1.0 121525.0 109 2 1324.6225 -24.305

Fifth Validation

The fifth validation will be done using the dataframe shown below:

In [19]:
val_5 = fn.validation(events,5)

Even though there seem to be two operations, with different prices on a decreasing trend, at the very moment the unemployment rate is announced, after this specific moment in time there is a visible positive trend in the time series overall.

In [20]:
vis.val_graph(val_5)

The empirical design of the strategy states that the operation to perform is to "buy" given that the direction is positive. The Take Profit pip variation will be in terms of the maximum variation, while the Stop Loss pip variation will be defined in terms of the minimum variation. In the profit and loss columns, a monetary result is shown. If the price touches the Take Profit barrier the expected utility would be \$40503.30. On the other hand, the loss if the price reaches the Stop Loss barrier would be -\$28987.66.

In [21]:
fn.empiric_trade(val_5)
Out[21]:
Operation Direction Volume Takeprofit(pip) Stoploss(pip) Profit($) Loss($)
Operation Buy 1.0 794182.5 510 365 40503.3075 -28987.66125

The scenarios presented in the validations occur sequentially: in temporal terms, the validation for the second scenario comes after the validation for the first scenario, and so on. Given this, the position size increment will not change; it remains static.

4.2 Statistical Aspects


Time series

In [78]:
### Let's see the time series evolution
vis.plot_ts(ts=unemployment,title='Unemployment rate',yaxes="Rate",xaxes='Time')

The unemployment rate has a downward trend from 2018 to March 2020. During the pandemic we can see an impact and a change to an upward trend, specifically during April and May. Despite the dominant downward trend observable from June to December, the rates remain higher than the ones reported during 2018 and 2019.

4.2.1 Autocorrelation and Autocorrelation Partial Component

To test whether our time series has a high degree of autocorrelation with its lagged versions, the Durbin-Watson test is used. Autocorrelation is important because it tells us whether a time series' past values have a relation with its future values.

Statistic:

$$DW=\frac{\sum_{t=2}^{T}(e_t - e_{t-1})^2}{\sum_{t=1}^{T}e_t^2}$$

Where:

$e$: Represents the residuals of the time series.
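The statistic can also be computed directly, as a minimal sketch independent of the fn.dw helper:

```python
import numpy as np

def durbin_watson(e):
    """Durbin-Watson statistic: values near 2 suggest no autocorrelation,
    values below 2 positive autocorrelation, above 2 negative."""
    e = np.asarray(e, dtype=float)
    return np.sum(np.diff(e) ** 2) / np.sum(e ** 2)

# A smooth, trending residual series yields a statistic well below 2
trend_residuals = np.arange(10.0) - 4.5
print(round(durbin_watson(trend_residuals), 3))  # 0.109
```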

In [22]:
help(fn.dw)
Help on function dw in module functions:

dw(x, funcs=[<function acf at 0x000001B09E5C0CA0>, <function pacf at 0x000001B09E5C0E50>])
    Docstring:
    
    The porpouse of this function is to compute the durbin watson test for the time series residuals,
    in order to know if there is autocorrelation present.
    Also the PACF and the ACF plot is compute.
    
    Parameters
    -------------
    x: time series
    funcs: an array with the acf, pacf function.
    
    Returns
    -------------
    returns three figures, which contain the result of the durbin watson test and the PACF and ACF plot.
    
    References
    -------------
    https://www.statology.org/durbin-watson-test-python/

In [23]:
fn.dw(x=dt.unemployment['Actual '])

The Durbin-Watson statistic is between 0 and 2; therefore we can say there is positive autocorrelation in the unemployment rate time series, meaning that past values have an effect on the outcome of future values.

4.2.2 Heteroscedasticity Test

Levene Test

Statistical test used to check whether the variances of two or more samples are equal (homoscedasticity).

Statistic:

$$ W = \frac{(N-k)\sum_{i=1}^{k}N_{i}(\bar{Z}_{i} - \bar{Z})^2}{(k-1)\sum_{i=1}^{k}\sum_{j=1}^{N_i}(Z_{ij}-\bar{Z}_{i})^2} $$

Where:

  • $W$: The test statistic.
  • $k$: Number of different groups.
  • $N$: Total number of cases.
  • $N_i$: Number of cases in group i.
  • $Y_{ij}$: Value of the observed variable for case j of group i.
  • $Z_{ij} = |Y_{ij} - \bar{Y}_{i}|$, where $\bar{Y}_{i}$ is the mean of group i.
  • $\bar{Z}_{i}$: The mean of the $Z_{ij}$ for group i; $\bar{Z}$ is the overall mean of the $Z_{ij}$.

Null Hypothesis: Variances are equal (homoscedasticity).

Bartlett Test

Statistical test used to check whether the variances of two or more samples are equal (homoscedasticity).

Statistic:

$$ X^{2} = \frac{(N-k)\ln(S^{2}_{p})-\sum_{i=1}^{k}(n_i-1)\ln(S^{2}_{i})}{1+\frac{1}{3(k-1)}\Big(\sum_{i=1}^{k}\frac{1}{n_i-1}-\frac{1}{N-k}\Big)} $$

Where:

  • $X^{2}$: The test statistic.
  • $k$: Number of different groups.
  • $N$: Total number of cases.
  • $n_i$: Number of cases in group i.
  • $S^{2}_{p}$: The pooled sample variance.
  • $S^{2}_{i}$: The sample variance of group i.

Null Hypothesis: Variances are equal (homoscedasticity).
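A minimal sketch of how both tests behave, using scipy.stats on two synthetic samples with deliberately different variances (not the project's unemployment data):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Two synthetic samples with deliberately different variances
first_half = rng.normal(0, 1.0, size=50)
second_half = rng.normal(0, 3.0, size=50)

lev_stat, lev_p = stats.levene(first_half, second_half)
bar_stat, bar_p = stats.bartlett(first_half, second_half)

# Small p-values reject the null of equal variances (homoscedasticity)
print(f"Levene p = {lev_p:.4f}, Bartlett p = {bar_p:.4f}")
```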

In [24]:
help(fn.var_test)
Help on function var_test in module functions:

var_test(x, alfa: float)
    Docstring
    
    The porpouse of this function is to test if the time series have the heteroscedasticity property.
    
    Parameters
    -------------------
    x: time series
    alfa: 1- significance level
    
    Returns
    --------------------
    A chart with the results of Levene and Bartlett statistical tests.

In [25]:
fn.var_test(x=dt.unemployment['Actual '],alfa=0.05)

After testing our time series with the Levene and Bartlett tests, both null hypotheses (homoscedasticity) were rejected with an $\alpha$ of 0.05. Therefore the heteroscedasticity property is present in our time series, meaning that the variance is not constant over time.

4.2.3 Normality Tests

The following tests are applied:

  • Shapiro-Wilk
  • Jarque-Bera
  • D'Agostino
  • Anderson-Darling

Shapiro-Wilk

It is a statistical test used to verify whether a set of data follows a normal distribution. Published by Samuel Shapiro and Martin Bradbury Wilk in 1965.

Statistic:

$$ W = \frac{\Big(\sum_{i=1}^{n}\alpha_{i}x_{(i)}\Big)^2}{\sum_{i=1}^{n} (x_{i} - \bar{x})^2 } $$

Where:

  • $W$: The Shapiro-Wilk test statistic.
  • $\alpha_{i}$: The Shapiro-Wilk coefficients.
  • $n$: Number of observations.
  • $x_{(i)}$: The i-th order statistic (the i-th smallest value in the sample).
  • $\bar{x}$: The sample mean.

Null Hypothesis $H_0$: The sample follows a normal distribution.

D'Agostino

Statistical test for normality, based on the sample kurtosis and skewness.

The sample skewness and kurtosis are defined as:

$$ g_1 = \frac{\frac{1}{n}\sum_{i=1}^{n}(x_i-\bar{x})^3}{\Big(\frac{1}{n}\sum_{i=1}^{n}(x_i-\bar{x})^{2}\Big)^{3/2}} $$

$$ g_2 = \frac{\frac{1}{n}\sum_{i=1}^{n}(x_i-\bar{x})^4}{\Big(\frac{1}{n}\sum_{i=1}^{n}(x_i-\bar{x})^{2}\Big)^{2}} $$

After a transformation of $g_1$ and $g_2$, the statistic is computed as:

$$ K^2 = Z_{1}(g_{1})^2 + Z_{2}(g_{2})^2 $$

Where:

  • $g_1$: The skewness sample.
  • $g_2$: The kurtosis sample.
  • $n$: Number of data.
  • $X_{i}$: The variable to be test at observation i.
  • $\bar{X}$: The average of the variable.
  • $Z_{1}(g_{1})$: Transformation for ${g_1}$.
  • $Z_{2}(g_{2})$: Transformation for ${g_2}$.

Null Hypothesis $H_0$: The sample follows a normal distribution.

Jarque-Bera

A goodness-of-fit test which uses the sample kurtosis and skewness to test whether the data follows a normal distribution.

Skewness & Kurtosis:

$$ S = \frac{\frac{1}{n}\sum_{i=1}^{n}(x_i-\bar{x})^3}{\Big(\frac{1}{n}\sum_{i=1}^{n}(x_i-\bar{x})^{2}\Big)^{3/2}} $$

$$ K = \frac{\frac{1}{n}\sum_{i=1}^{n}(x_i-\bar{x})^4}{\Big(\frac{1}{n}\sum_{i=1}^{n}(x_i-\bar{x})^{2}\Big)^{2}} $$

Jarque Bera Statistic:

$${JB = \frac{n}{6}\Big(S^{2} + \frac{1}{4}(K-3)^{2}\Big)}$$

Where:

  • ${n}$: Is the number of observations.
  • ${S}$: The sample Skewness.
  • ${K}$: The sample Kurtosis.
  • $X_{i}$: The variable to be test at observation i.
  • $\bar{X}$: The average of the variable.

Null Hypothesis $H_0$: The sample follows a normal distribution.
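As a sketch of how these tests behave in practice, the scipy.stats versions can be run on two synthetic samples, one normal and one clearly skewed (illustrative data, not the project's series):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
normal_sample = rng.normal(size=200)
skewed_sample = rng.exponential(size=200)  # clearly non-normal

for name, sample in [("normal", normal_sample), ("skewed", skewed_sample)]:
    sw_stat, sw_p = stats.shapiro(sample)
    da_stat, da_p = stats.normaltest(sample)   # D'Agostino K^2
    jb_stat, jb_p = stats.jarque_bera(sample)
    # p-values below alpha reject the null of normality
    print(f"{name}: shapiro={sw_p:.3f} dagostino={da_p:.3f} jarque_bera={jb_p:.3f}")
```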

In [26]:
help(fn.normality_test)
Help on function normality_test in module functions:

normality_test(x, alfa: float, funcs: list = [<function shapiro at 0x000001B09D050670>, <function normaltest at 0x000001B09CFFCB80>, <function jarque_bera at 0x000001B09CFFCF70>, <function anderson at 0x000001B09D050E50>])
    Docstring
    
    The porpouse of this function is to compute normalility test for a time series with the following statistical tests:
    shapiro, d'angostino, jarque bera and anderson darling.
    
    Parameters
    ------------
    x: time series
    alfa: 1-significance level
    funcs: a predifined list with normal test functions.
    
    
    Return 
    ------------
    
    A chart with the normalility test results.
    
    References
    --------------
    https://plotly.com/python/v3/normality-test/

In [27]:
fn.normality_test(x=dt.unemployment['Actual '],alfa=0.05)
In [28]:
help(vis.hist)
Help on function hist in module visualizations:

hist(x, title: str, yaxes: str, xaxes: str)
    Docstring
    Function that plots the time series Histogram.
    
    Parameters
    --------------------
    x: time series.
    title: the title of the plot.
    
    Returns
    --------------------
    boxplot of the time series.

After computing the different statistical tests we can conclude that our data do not follow a normal distribution. Note that the value reported for the Anderson-Darling test is bigger than the chosen alpha, yet $H_0$ is still rejected: Anderson-Darling compares its statistic against critical values rather than a p-value, so $H_0$ is rejected when the statistic exceeds the critical value at the chosen significance level.

We can also confirm that our time series lacks normality by looking at the histogram and the QQ-plot.

In [29]:
vis.hist(x=dt.unemployment['Actual '],title='Unemployment Rate Histogram',yaxes="Count",xaxes="Rates")
In [30]:
help(fn.qq)
Help on function qq in module functions:

qq(x, qqplot_data)
    Docstring
    
    The porpouse of this function is to plot a qqplot.
    
    Parameters
    --------------
    x: time series
    qqplot_data: figure
    
    Returns
    --------------
    A qqplot figure.
    
    References
    --------------
    https://plotly.com/python/v3/normality-test/

In [31]:
qqplot_data=qqplot(dt.unemployment['Actual '], line='s').gca().lines
In [32]:
fn.qq(x=dt.unemployment['Actual '],qqplot_data=qqplot_data)

4.2.4 Seasonality

To check whether our indicator time series has a seasonal component, the Kruskal-Wallis test is used: a non-parametric test whose purpose is to determine whether n samples originate from the same distribution.

Statistic: $${Q=\frac{SS_{t}}{SS_{e}}}$$

Where:

  • ${SS_{t} = (N-1)\sum_{j=1}^{g}n_{j}(\bar{r}_{j}-\bar{r})^{2}}$
  • ${SS_{e} = \sum_{j=1}^{g}\sum_{i=1}^{n_j}(r_{ij}-\bar{r})^{2}}$
  • $n_j$: number of observations in group j.
  • $\bar{r}_j$: the mean of the ranks of the observations in group j.
  • $\bar{r} = \frac{1}{2}(N+1)$: the average rank.

Null Hypothesis: all samples (in a time series context: months, quarters, ...) come from the same distribution. If rejected, a seasonal component is present.
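A sketch of the idea on a synthetic monthly series with an obvious seasonal pattern (illustrative data, grouped by calendar month as fn.seasonality does with m=12):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
# Three years of a hypothetical monthly series with a strong seasonal pattern
months = np.tile(np.arange(12), 3)
values = 5 * np.sin(2 * np.pi * months / 12) + rng.normal(0, 0.1, size=36)

# Group observations by calendar month and compare the groups' ranks
groups = [values[months == m] for m in range(12)]
stat, p = stats.kruskal(*groups)
print(f"H = {stat:.2f}, p-value = {p:.4f}")  # a small p-value points to seasonality
```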

In [33]:
help(fn.seasonality)
Help on function seasonality in module functions:

seasonality(x, alfa: float, m: int)
    Docstring
    
    The porpouse of this function is to check if the time series have a seasonal component,
    wirh the kruskal wallis test.
    
    Parameters
    ----------------
    x: time series
    alfa: 1-significance level
    m: periods of the timeseries if monthly 12, quarter, etc....
    
    
    Returns
    ----------------
    A chart with the results.
    
     Reference
    ----------------
    https://knk00.medium.com/how-to-determine-seasonality-without-plots-f18cee913b95

In [34]:
fn.seasonality(x=dt.unemployment['Actual '],alfa=0.05,m=12)

The Kruskal-Wallis test rejects the null hypothesis with an $\alpha$ of 0.05; therefore at least one sample has a different median from the rest. In a time series context this can be interpreted as at least one sample stochastically dominating another, so a seasonal component is present in the time series.

4.2.5 Stationarity

A process is defined as stationary when its mean, variance and autocorrelation structure do not change over time.

The Augmented Dickey-Fuller test is the statistical test used to check whether our indicator time series has this property. It is a unit root test; the intuition behind it is that it determines how strong the trend component of a time series is.

Statistic:

$${DF_{\tau}=\frac{\hat{\gamma}}{SE(\hat{\gamma})}}$$

Where:

  • $\gamma$: Unit root.
  • SE: Standard error.

The null hypothesis of the test is that the time series can be represented by a unit root, i.e., that it is not stationary. The alternative hypothesis (rejecting the null hypothesis) is that the time series is stationary.

In [35]:
help(fn.stationarity)
Help on function stationarity in module functions:

stationarity(x, alfa: float)
    Docstring
    
    The porpouse of this function is to test if the the time series is stationary usign the Augmented Dickey Fuller test.
    
    Parameters
    -----------------
    x: time series
    alfa: 1-significance level
    
    Returns
    ----------------
    A chart with the results of the test.
    
    References
    ----------------
    https://machinelearningmastery.com/time-series-data-stationary-python/

In [36]:
fn.stationarity(x=dt.unemployment['Actual '],alfa=.05)

After computing the Augmented Dickey-Fuller test, the p-value is bigger than an $\alpha$ of 0.05; therefore the null hypothesis cannot be rejected and we conclude that our time series is not stationary.

This statement is consistent with the outputs of the normality and heteroscedasticity tests: if the variance were constant and the data normally distributed (normal data is stationary), then the time series would be stationary.

4.2.6 Atypical Value Detection

To check whether our time series has outliers, the IQR (Interquartile Range) criterion is used. The IQR is the difference between the third and the first quartile. Under this criterion the data is divided into quarters; the IQR represents the middle half of the data, lying between the upper and lower quartiles.

$$ IQR = Q_3 - Q_1 $$

An upper limit is defined; observations above this limit are considered outliers.
Upper limit = $Q_3 + 1.5 \cdot IQR$

A lower limit is defined; observations below this limit are considered outliers.
Lower limit = $Q_1 - 1.5 \cdot IQR$
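A minimal sketch of the criterion; the helper name iqr_outliers and the sample values are illustrative, not the fn.iqr implementation:

```python
import numpy as np

def iqr_outliers(x):
    """Return the values lying outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR]."""
    x = np.asarray(x, dtype=float)
    q1, q3 = np.percentile(x, [25, 75])
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return x[(x < lower) | (x > upper)]

# Mostly low rates plus two pandemic-style spikes
sample = [4.0, 4.1, 3.9, 4.2, 3.8, 4.0, 4.1, 14.7, 13.3]
print(iqr_outliers(sample))  # [14.7 13.3]
```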

In [37]:
help(fn.iqr)
Help on function iqr in module functions:

iqr(x)
    Docstring
    
    The porpouse of this function is to check if a time series have outliers useing the 
    IQR criterion.
    
    Parameters
    ------------------
    x: time series
    
    Returns
    ------------------
    a dataframe with the values that are consider outliers.
    
     References
    ------------------
    https://www.statology.org/interquartile-range-python/

In [38]:
warnings.filterwarnings("ignore")
fn.iqr(x=dt.unemployment)
Out[38]:
28    0.147
29    0.133
30    0.111
31    0.102
32    0.084
33    0.079
34    0.069
35    0.067
Name: Actual , dtype: float64
In [39]:
help(vis.boxplot)
Help on function boxplot in module visualizations:

boxplot(x, title: str, yaxes: str, xaxes: str)
    Docstring
    Function that plots the time series boxplot.
    
    Parameters
    --------------------
    x: time series.
    title: the title of the plot.
    
    Returns
    --------------------
    boxplot of the time series.

In [40]:
vis.boxplot(x=dt.unemployment['Actual '],title='Unemploymnet Rate Boxplot',yaxes='Sample',xaxes="Rates")

The output of our function shows eight atypical values in the indicator time series, all above the upper limit defined by our criterion; they can be visualized in the boxplot. All of them occur during 2020, a period characterized by a high unemployment rate, likely due to the pandemic crisis.

4.3 Computational Aspects


4.3.1 Scenario Classification

Each event is classified according to the following rules, which represent occurrence scenarios:

| Scenario | Rule |
| --- | --- |
| A | Actual $\geq$ Consensus $\geq$ Previous |
| B | Actual $\geq$ Consensus $<$ Previous |
| C | Actual $<$ Consensus $\leq$ Previous |
| D | Actual $<$ Consensus $<$ Previous |
  • A – Actual is greater than or equal to the consensus, which is greater than or equal to the previous value: the rate was expected to rise, and it rose.
  • B – Actual is greater than or equal to the consensus, but the consensus is below the previous value: the rate was expected to fall, yet it went up.
  • C – Actual is below the consensus, which is less than or equal to the previous value: the rate fell by more than expected.
  • D – Actual is below the consensus, which in turn is below the previous value: the rate was expected to fall, and it fell even further.

To perform this classification, a function iterates with a for loop over the data frame containing the monthly reports of the indicator. Within this loop, the percentages reported as actual, consensus and prior are compared through multiple conditions, following the aforementioned rules, and the corresponding scenario is assigned.
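A minimal sketch of that comparison logic on numeric rates (the function name `classify_scenario` and the toy frame are illustrative; note that the table's rules for C and D overlap, so this sketch sends the tie Consensus = Previous to C and the strict case to D):

```python
import pandas as pd

def classify_scenario(actual: float, consensus: float, previous: float) -> str:
    """Assign a scenario letter following the A-D rules described above."""
    if actual >= consensus:
        # Actual met or beat the consensus
        return 'A' if consensus >= previous else 'B'
    # Actual came in below the consensus; tie-break the C/D overlap
    return 'C' if consensus == previous else 'D'

reports = pd.DataFrame({
    'Actual':    [4.1, 4.1, 3.9],
    'Consensus': [4.1, 4.0, 4.0],
    'Prior':     [4.1, 4.1, 4.1],
})
reports['Scenario'] = [
    classify_scenario(r.Actual, r.Consensus, r.Prior) for r in reports.itertuples()
]
print(reports['Scenario'].tolist())  # ['A', 'B', 'D']
```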

In [41]:
help(fn.Scenario_Clasification)
Help on function Scenario_Clasification in module functions:

Scenario_Clasification(indicator)
    This function classifies the indicator data according to pre-set scenarios
    Parameters
    ----------
    indicator : Dataframe
           All three years worth of indicator data

In [42]:
data = fn.data_manipulation(usdmxn_18,usdmxn_19,usdmxn_20,unemployment)
usdmxn = data[0]
unemployment = data[1]
In [43]:
unemployment_df = fn.Scenario_Clasification(unemployment)
display(unemployment_df.head())
display(unemployment_df.tail())
Country/Region Event Importance Period Actual Consensus Prior Scenario
Datetime
2018-01-05 07:30:00 United States Unemployment Rate High DEC 4.1% 4.1% 4.1% A
2018-02-02 07:30:00 United States Unemployment Rate High JAN 4.1% 4.1% 4.1% A
2018-03-09 07:30:00 United States Unemployment Rate High FEB 4.1% 4.0% 4.1% B
2018-04-06 07:30:00 United States Unemployment Rate High MAR 4.1% 4.0% 4.1% B
2018-05-04 07:30:00 United States Unemployment Rate High APR 3.9% 4.0% 4.1% D
Country/Region Event Importance Period Actual Consensus Prior Scenario
Datetime
2020-08-07 07:30:00 United States Unemployment Rate High JUL 10.2% 10.5% 11.1% D
2020-09-04 07:30:00 United States Unemployment Rate High AUG 8.4% 9.8% 10.2% C
2020-10-02 07:30:00 United States Unemployment Rate High SEP 7.9% 8.2% 8.4% D
2020-11-06 07:30:00 United States Unemployment Rate High OCT 6.9% 7.7% 7.9% D
2020-12-04 07:30:00 United States Unemployment Rate High NOV 6.7% 6.8% 6.9% D

4.3.2 Events

Through a function, a data frame is created for each event, consisting of one-minute prices from 30 minutes before to 30 minutes after each release of the indicator, resulting in 36 data frames.

The function creates the data frames through a loop that connects two data frames: the one containing the events (each release) and a data frame of USDMXN prices per minute. The release date is looked up in the price data frame, and the 30 minutes before and 30 minutes after are extracted using “datetime.timedelta”.
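A sketch of that window extraction, assuming minute bars with a 'timestamp' column (the helper name `event_window` and the toy data are illustrative; the project's version is fn.Event_Data):

```python
import datetime
import pandas as pd

def event_window(prices: pd.DataFrame, release: pd.Timestamp,
                 minutes: int = 30) -> pd.DataFrame:
    """Slice the minute prices from `minutes` before to `minutes` after a release."""
    delta = datetime.timedelta(minutes=minutes)
    mask = ((prices['timestamp'] >= release - delta) &
            (prices['timestamp'] <= release + delta))
    return prices.loc[mask].reset_index(drop=True)

# Toy minute bars around a 07:30 release
ts = pd.date_range('2018-01-05 06:45', periods=120, freq='min')
prices = pd.DataFrame({'timestamp': ts, 'close': range(120)})
window = event_window(prices, pd.Timestamp('2018-01-05 07:30'))
print(len(window))  # 61 bars: 30 before, the release minute, 30 after
```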

In [44]:
warnings.filterwarnings('ignore') 
eventos_df = fn.Event_Data(data[0], unemployment)
eventos_df[0].head()
Out[44]:
index timestamp open high low close volume date
0 3297 2018-01-05 07:00:00 23.041475 23.046785 23.041475 23.046785 520680.0 2018-01-05
1 3298 2018-01-05 07:01:00 23.046785 23.046785 23.046785 23.046785 238645.0 2018-01-05
2 3299 2018-01-05 07:04:00 23.041475 23.041475 23.041475 23.041475 217000.0 2018-01-05
3 3300 2018-01-05 07:06:00 23.046785 23.046785 23.046785 23.046785 21695.0 2018-01-05
4 3301 2018-01-05 07:08:00 23.046785 23.046785 23.046785 23.046785 151865.0 2018-01-05

4.3.3 Metrics

The Metrics function defined in the functions.py file does all the calculations for the corresponding metrics:

  • Direction
  • Bullish pip
  • Bear pip
  • Volatility

It displays a dataframe with these metrics for every release of the unemployment rate. All the previously mentioned metrics are reported in pips (price difference × 10,000).

**Direction**

$$(\text{Close}(t_{30}) - \text{Open}(t_0))$$

This specific metric shows the tendency of a time series. Applied to the prices of futures contracts, it is calculated by subtracting the opening price at the moment the indicator is published from the closing price 30 minutes later. Only the sign of the result is relevant: if it is positive (+1), the closing price was higher than the opening; otherwise, the sign is negative (-1).

**Bullish Pip**

$$ \text{High}(t_0:t_{30}) - \text{Open}(t_0) $$

To obtain this metric, the maximum of the high prices during the thirty minutes after the announcement is found; then the opening price at t=0 is subtracted from that maximum. This gives a general idea of the largest upward pip variation in the analyzed period.

**Bear Pip**

$$ \text{Open}(t_0) - \text{Low}(t_0:t_{30}) $$

Opposite to the bullish pip, this metric is calculated by subtracting the minimum of the low prices in the thirty-minute window after the publication from the opening price at the moment the indicator is announced. It illustrates the largest downward pip variation during the analyzed period.

**Volatility**

$$ \text{High}(t_{-30}:t_{30}) - \text{Low}(t_{-30}:t_{30}) $$

This metric represents, in a statistical way, the dispersion of prices. To calculate it, the minimum of the low prices is subtracted from the maximum of the high prices over the full window, from 30 minutes before to 30 minutes after the release.
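Putting the four formulas together, a sketch of the calculations over one event window (the helper name `event_metrics` and the toy data are illustrative; the project's version is fn.Metrics):

```python
import pandas as pd

def event_metrics(window: pd.DataFrame, t0: pd.Timestamp) -> dict:
    """Direction, bullish pip, bear pip and volatility for one release window."""
    after = window[window['timestamp'] >= t0]   # the t_0 .. t_30 slice
    open_t0 = after['open'].iloc[0]
    close_t30 = after['close'].iloc[-1]
    pip = 10_000                                # pip = price difference x 10,000
    return {
        'Direction': 1 if close_t30 - open_t0 > 0 else -1,
        'Bullish_Pip': (after['high'].max() - open_t0) * pip,
        'Bear_Pip': (open_t0 - after['low'].min()) * pip,
        'Volatility': (window['high'].max() - window['low'].min()) * pip,
    }

# Toy window spanning t_-30 .. t_30 with constant OHLC bars
ts = pd.date_range('2018-01-05 07:00', periods=61, freq='min')
window = pd.DataFrame({'timestamp': ts, 'open': 23.00, 'high': 23.05,
                       'low': 22.98, 'close': 23.02})
print(event_metrics(window, pd.Timestamp('2018-01-05 07:30')))
```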

In [45]:
help(fn.Metrics)
Help on function Metrics in module functions:

Metrics(eventos_df, unemployment_df)
    This function returns a consolidated dataframe for metrics like direction, bullish pip, bear pip and volatility.
    Parameters
    ----------
    eventos_df : Dataframe
        Dataframe that contains trading information related to the chosen currency for each indicator event.
    
    unemployment_df : Dataframe
        Dataframe that contains indicator information.

In [46]:
df_escenarios= fn.Metrics(eventos_df, unemployment_df)
display(df_escenarios.head())
display(df_escenarios.tail())
Scenario Direction Bullish_Pip Bear_Pip Volatility
Datetime
2018-01-05 07:30:00 A 1 802.085679 0.000000 1227.597677
2018-02-02 07:30:00 A 1 519.736522 0.000000 613.967477
2018-03-09 07:30:00 B 1 142.426418 236.964174 379.390592
2018-04-06 07:30:00 B 1 134.916728 179.622243 359.549382
2018-05-04 07:30:00 D -1 0.000000 396.470461 694.752120
Scenario Direction Bullish_Pip Bear_Pip Volatility
Datetime
2020-08-07 07:30:00 D 1 159.383039 318.105955 690.343025
2020-09-04 07:30:00 C 1 146.095666 48.655592 534.740381
2020-10-02 07:30:00 D -1 0.000000 448.722442 949.425028
2020-11-06 07:30:00 D 1 712.439193 177.360020 1287.769388
2020-12-04 07:30:00 D 1 280.056146 39.944059 479.808748

4.3.4 Decisions

Using the training data, the function "fn.decisions" creates a dataframe with the strategy to place orders according to the scenario.

In [47]:
help(fn.decisions)
Help on function decisions in module functions:

decisions(df_escenarios, usdmxn)
    Function that creates a dataframe with the designed
    strategy to place orders according to the scenario.
    
    Parameters
    ----------
    
    df_escenarios:dataframe
    
        'Datetime': timestamp, date of the indicator
        'Scenario': A, B, C or D
        'Direction': -1 if close price < open, 1 if close price > open
        'Bullish_Pip': difference between the highest price (t_0:t_30) and the open price t_0
        'Bear_Pip': difference between the open price t_0 and the lowest price (t_0:t_30)
        'Volatility': difference between the highest price and the lowest
    
     usdmxn:dataframe of the prices of the currency
    
    Returns
    -------
        df_de: dataframe
            dataframe with the following information
            'Scenario': A, B, C or D
            'Operation': Sell or Buy
            'SL': stop loss
            'TP': take profit
            'Volume': optimal volume

In [48]:
df_decisions = fn.decisions(df_escenarios, usdmxn)
df_decisions
Out[48]:
Scenario Operation SL TP Volume
0 A Sell 353.0 228.0 729960.0
1 B Sell 188.0 256.0 47170.0
2 C Buy 228.0 919.0 1894200.0
3 D Buy 407.0 188.0 1637755.0

4.3.5 Back Test

In order to see the performance associated with the previous trading strategy, we run a back test applying the above configuration to our trading decisions; with this we expect to know how far we are from a good performance of our algorithm.

To work with this and other configurations we set a training and a test set; they cover different periods but share essentially the same structure of intraday USDMXN prices. The same split must also be applied to the indicator reporting dates because, as previously said, these are the main factor behind each trading decision.

  • Training period: 01/01/2018 - 01/01/2019
  • Test period: 02/01/2019 - 02/01/2020
In [49]:
### Let's see our training set
display(dt.training_usdmxn.head())
display(dt.training_usdmxn.tail())
timestamp open high low close volume date
0 2018-01-01 18:00:00 23.590469 23.596036 23.590469 23.596036 1292590.0 2018-01-01
1 2018-01-01 18:01:00 23.590469 23.590469 23.590469 23.590469 466290.0 2018-01-01
2 2018-01-01 18:05:00 23.590469 23.590469 23.590469 23.590469 423900.0 2018-01-01
3 2018-01-01 18:07:00 23.584906 23.590469 23.584906 23.590469 254340.0 2018-01-01
4 2018-01-01 18:08:00 23.590469 23.596036 23.590469 23.596036 423800.0 2018-01-01
timestamp open high low close volume date
248982 2019-01-01 23:19:00 21.978022 21.978022 21.978022 21.978022 45500.0 2019-01-01
248983 2019-01-01 23:20:00 21.978022 21.978022 21.978022 21.978022 45500.0 2019-01-01
248984 2019-01-01 23:24:00 21.973193 21.973193 21.973193 21.973193 22755.0 2019-01-01
248985 2019-01-01 23:38:00 21.978022 21.978022 21.978022 21.978022 45500.0 2019-01-01
248986 2019-01-01 23:46:00 21.978022 21.978022 21.978022 21.978022 68250.0 2019-01-01
In [50]:
### Now let's display our test data set
display(dt.test_usdmxn.head())
display(dt.test_usdmxn.tail())
timestamp open high low close volume date
248987 2019-01-02 00:07:00 21.978022 21.978022 21.978022 21.978022 45500.0 2019-01-02
248988 2019-01-02 00:08:00 21.978022 21.978022 21.978022 21.978022 182000.0 2019-01-02
248989 2019-01-02 00:14:00 21.982853 21.982853 21.982853 21.982853 68235.0 2019-01-02
248990 2019-01-02 00:16:00 21.982853 21.997360 21.982853 21.997360 2568490.0 2019-01-02
248991 2019-01-02 00:17:00 21.992523 21.992523 21.992523 21.992523 68205.0 2019-01-02
timestamp open high low close volume date
472923 2020-01-02 23:38:00 19.884669 19.884669 19.884669 19.884669 125725.0 2020-01-02
472924 2020-01-02 23:41:00 19.884669 19.884669 19.884669 19.884669 477755.0 2020-01-02
472925 2020-01-02 23:49:00 19.888624 19.888624 19.888624 19.888624 25140.0 2020-01-02
472926 2020-01-02 23:51:00 19.888624 19.888624 19.888624 19.888624 75420.0 2020-01-02
472927 2020-01-02 23:59:00 19.892580 19.892580 19.892580 19.892580 1080805.0 2020-01-02
In [51]:
### Let's see scenario clasification for training period
clasification = fn.Scenario_Clasification(dt.unemployment)

clasification_train = clasification[(clasification.index>=pd.to_datetime('2018-01-01')) &
                                    (clasification.index<=pd.to_datetime('2019-01-01'))]
display(clasification_train.head())
display(clasification_train.tail())
Country/Region Event Importance Period Actual Consensus Prior Scenario
Datetime
2018-01-05 07:30:00 United States Unemployment Rate High DEC 0.041 0.041 0.041 A
2018-02-02 07:30:00 United States Unemployment Rate High JAN 0.041 0.041 0.041 A
2018-03-09 07:30:00 United States Unemployment Rate High FEB 0.041 0.040 0.041 B
2018-04-06 07:30:00 United States Unemployment Rate High MAR 0.041 0.040 0.041 B
2018-05-04 07:30:00 United States Unemployment Rate High APR 0.039 0.040 0.041 D
Country/Region Event Importance Period Actual Consensus Prior Scenario
Datetime
2018-08-03 07:30:00 United States Unemployment Rate High JUL 0.039 0.039 0.040 B
2018-09-07 07:30:00 United States Unemployment Rate High AUG 0.039 0.038 0.039 B
2018-10-05 07:30:00 United States Unemployment Rate High SEP 0.037 0.038 0.039 D
2018-11-02 06:30:00 United States Unemployment Rate High OCT 0.037 0.037 0.037 A
2018-12-07 07:30:00 United States Unemployment Rate High NOV 0.037 0.037 0.038 B
In [52]:
### Now let's see the clasification for test period
clasification_test = clasification[(clasification.index>=pd.to_datetime('2019-01-02')) &
                                   (clasification.index<=pd.to_datetime('2020-01-02'))]
display(clasification_test.head())
display(clasification_test.tail())
Country/Region Event Importance Period Actual Consensus Prior Scenario
Datetime
2019-01-04 07:30:00 United States Unemployment Rate High DEC 0.039 0.037 0.037 A
2019-02-01 07:30:00 United States Unemployment Rate High JAN 0.040 0.039 0.039 A
2019-03-08 07:30:00 United States Unemployment Rate High FEB 0.038 0.039 0.040 D
2019-04-05 06:30:00 United States Unemployment Rate High MAR 0.038 0.038 0.038 A
2019-05-03 07:30:00 United States Unemployment Rate High APR 0.036 0.038 0.038 C
Country/Region Event Importance Period Actual Consensus Prior Scenario
Datetime
2019-08-02 07:30:00 United States Unemployment Rate High JUL 0.037 0.037 0.037 A
2019-09-06 07:30:00 United States Unemployment Rate High AUG 0.037 0.037 0.037 A
2019-10-04 07:30:00 United States Unemployment Rate High SEP 0.035 0.037 0.037 C
2019-11-01 06:30:00 United States Unemployment Rate High OCT 0.036 0.036 0.035 A
2019-12-06 07:30:00 United States Unemployment Rate High NOV 0.035 0.036 0.036 C

With all the required information, we proceed to run our back test in order to understand what is happening with the strategy.

In [53]:
df_decisions
Out[53]:
Scenario Operation SL TP Volume
0 A Sell 353.0 228.0 729960.0
1 B Sell 188.0 256.0 47170.0
2 C Buy 228.0 919.0 1894200.0
3 D Buy 407.0 188.0 1637755.0
In [54]:
### Back test for A scenario. (Sell order)
initial_cap = 100000
fn.get_trading_summary(dt.training_usdmxn, clasification_train[clasification_train['Scenario']=='A'], df_decisions['SL'][0],
                       df_decisions['TP'][0], df_decisions['Volume'][0], initial_cap, 'A')
Out[54]:
Scenario Operation Volume Result Pip Up Pip Down Capital Cumulative Capital
Datetime
2018-07-06 07:30:00 A Sell 729960.0 Won 353.0 228.0 807.656561 98524.713247
2018-11-02 06:30:00 A Sell 729960.0 Won 353.0 228.0 988.659142 99513.372389
In [55]:
### Back test for B scenario. (Sell order)
fn.get_trading_summary(dt.training_usdmxn, clasification_train[clasification_train['Scenario']=='B'], df_decisions['SL'][1],
                       df_decisions['TP'][1], df_decisions['Volume'][1], initial_cap, 'B')
Out[55]:
Scenario Operation Volume Result Pip Up Pip Down Capital Cumulative Capital
Datetime
2018-03-09 07:30:00 B Sell 47170.0 Lost 188.0 256.0 -41.115711 99958.884289
2018-04-06 07:30:00 B Sell 47170.0 Won 188.0 256.0 59.911092 100018.795381
2018-08-03 07:30:00 B Sell 47170.0 Lost 188.0 256.0 -60.345416 99958.449965
2018-09-07 07:30:00 B Sell 47170.0 Lost 188.0 256.0 -41.214504 99917.235461
2018-12-07 07:30:00 B Sell 47170.0 Won 188.0 256.0 97.682927 100014.918388
In [56]:
### Back test for C scenario. (Buy order)
fn.get_trading_summary(dt.training_usdmxn, clasification_train[clasification_train['Scenario']=='C'], df_decisions['TP'][2],
                       df_decisions['SL'][2], df_decisions['Volume'][2], initial_cap, 'C')
Out[56]:
Scenario Operation Volume Result Pip Up Pip Down Capital Cumulative Capital
Datetime
2018-06-01 07:30:00 C Buy 1894200.0 Won 919.0 228.0 7509.654851 107509.654851
In [57]:
### Back test for D scenario. (Buy order)
fn.get_trading_summary(dt.training_usdmxn, clasification_train[clasification_train['Scenario']=='D'], df_decisions['TP'][3],
                       df_decisions['SL'][3], df_decisions['Volume'][3], initial_cap, 'D')
Out[57]:
Scenario Operation Volume Result Pip Up Pip Down Capital Cumulative Capital
Datetime

From the back test results we can see a positive balance, meaning we have developed a simple but effective trading strategy.

In [59]:
### Compute the limits once and unpack them for each scenario
lims = fn.limits(df_escenarios)

limit_low_bull_a, limit_high_bull_a, limit_low_bear_a, limit_high_bear_a = lims[0][:4]
limit_low_bull_b, limit_high_bull_b, limit_low_bear_b, limit_high_bear_b = lims[1][:4]
limit_low_bull_c, limit_high_bull_c, limit_low_bear_c, limit_high_bear_c = lims[2][:4]
limit_low_bull_d, limit_high_bull_d, limit_low_bear_d, limit_high_bear_d = lims[3][:4]

5. Trading System Definition


As we have been saying across the notebook, our trading system is based on the reported value of the US unemployment rate; in that sense, we have four scenarios:

  • A: Here we have a clear increase in the reported rate (the reported indicator surpasses not only the previous value but also the economic expectations).

  • B: For this scenario the reported indicator has to be greater than or equal to the economic expectations, but even within this defined scenario we can see at least two situations: one where the actual value surpasses the previous one, and another where the previous value is greater than the current data.

  • C: For this scenario the reported indicator has to be less than the economic expectations, but even within this defined scenario we can see at least two situations: one where the actual value surpasses the previous one, and another where the previous value is greater than the current data.

  • D: The last event corresponds to a clear decrease in the unemployment rate; it is a linear decrease where the economic expectations are below the previous value and the reported one is even lower than the expectations.

Having defined the possible scenarios, we now need to set a trading strategy with this information, that is, the signal to perform a specific operation:

  • A: Sell order (because we expect the USDMXN to go down, and we want to sell high and buy low)
  • B: Sell order (because we expect the USDMXN to go down, mainly because the rate is higher than the economic expectations)
  • C: Buy order (because we expect the USDMXN to go up, and we want to buy cheap)
  • D: Buy order (because we expect the USDMXN to go up, and we want to buy cheap)

We define a single movement each time the indicator is reported; that operation is closed when one of the defined limits is touched:

  • Takeprofit: winning limit defined for each kind of operation
  • Stoploss: losing limit defined for each kind of operation
  • Volume: trading volume assigned to each kind of operation

Now a natural question would be: what happens if the price movements never touch the Takeprofit or Stoploss limits? To handle that, the designed algorithm closes the operation just before the next indicator report is published; with this in consideration, the operation is closed with the balance at that timestamp.
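A sketch of that exit rule for a buy order (the helper `close_buy_order` is hypothetical; in the project this logic lives inside fn.get_trading_summary): walk the minute bars after the entry and close at the first limit touched, otherwise at the last available close before the next report.

```python
import pandas as pd

def close_buy_order(bars: pd.DataFrame, entry: float,
                    tp_pips: float, sl_pips: float) -> float:
    """Return the P&L in pips for a buy closed at TP, SL or the final bar."""
    tp_price = entry + tp_pips / 10_000
    sl_price = entry - sl_pips / 10_000
    for bar in bars.itertuples():
        # Checking TP first is an assumption when both limits sit inside one bar
        if bar.high >= tp_price:
            return tp_pips
        if bar.low <= sl_price:
            return -sl_pips
    # Neither limit was touched: close just before the next report
    return (bars['close'].iloc[-1] - entry) * 10_000

# Toy bars: the take profit (entry + 500 pips = 23.05) is touched on the third bar
bars = pd.DataFrame({'high': [23.01, 23.02, 23.06],
                     'low':  [22.99, 23.00, 23.03],
                     'close': [23.00, 23.01, 23.05]})
print(close_buy_order(bars, entry=23.00, tp_pips=500, sl_pips=300))
```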

To better understand the trading system definition, we display the corresponding docstring.

In [60]:
### Let's see the docstring for the trading system
help(fn.get_trading_summary)
Help on function get_trading_summary in module functions:

get_trading_summary(data: pandas.core.frame.DataFrame, clasification: pandas.core.frame.DataFrame, pip_up: float, pip_down: float, volume: float, intial_cap: float, scenario=None) -> pandas.core.frame.DataFrame
    Trading system definition based on the unemployment rate from the USA economy.
    The decisions are set from a previous knowledge on the behaviour and relationship
    between the USDMXN price and the reported value for the indicator. It summarizes the
    trading operations and the capital evolution through the life of the operations
    
    Parameters
    ----------
    
    data: pd.DataFrame (default:None) --> Required parameter
    
        USDMXN prices on a minute granularity, it has to follow the next structure
    
        'timestamp': First column, correspond to the timestamp associated to each price
        'open': Second column, correspond to the open price for each timestamp associated
        'high': Third column, correspond to the high price for each timestamp associated
        'low': Fourth column, correspond to the low price for each timestamp associated
        'close': Fifth column, correspond to the close price for each timestamp associated
        'volume': Sixth column, correspond to the volume operated for each timestamp associated
        'date': Seventh column, correspond to the date in YYYY-MM-DD format for each timestamp
    
    clasification: pd.DataFrame (default:None) --> Required parameter
    
        USA unemployment rate reporting from Jan-2018 to Dec-2020 (monthly frequency)
    
        'Datetime': DataFrame index, correspond to the timestamp where the indicator was reported
        'Country/Region ': Region of origin (unique value "United States")
        'Event ': Indicator name
        'Importance ': Level of importance associated to the indicator within the USA economy
        'Period ': Reported period
        'Actual ': Reported value for the indicator
        'Consensus ': Economical expectations to the indicator report value
        'Prior ': Previous value corresponding to the indicator (previous month)
        'Scenario': Type of scenario definition
    
    pip_up: float (default:None) --> Required parameter
    
        Number of Pip's that will define an increase on USDMXN prices
    
    pip_down: float (default:None) --> Required parameter
    
        Number of Pip's that will define a downgrade on USDMXN prices
    
    volume: float (default:None) --> Required parameter
    
        Number of USDMXN to be trade by each operation
    
    intial_cap: float (default:None) --> Required parameter
    
        Initial capital for start the trading system (in USD)
    
    scenario: str (default:None) --> Optional parameter
    
        Scenario where the trading system wants to be analyzed. If none it will display all scenarios
    
    Returns
    -------
    
    trading_res: pd.DataFrame
    
        Final summary associated to the trading strategy it can correspond just to a single scenario or
        all of them contained in clasification data frame. It follows the next structure
    
        'Datetime': Index, timestamp where the indicator was reported
        'Scenario': Scenario associated to the trading decision within that timestamp
        'Operation': Signal detection (buy or sell)
        'Volume': Sell or buy volume associated to the trading decision
        'Result': Balance of the operation (won or lost)
        'Pip Up': The upper pip defined for that trading strategy
        'Pip Down': The lower pip defined for that trading strategy
        'Capital': Utility assigned to the operation
        'Cumulative Capital': Evolution of the invested capital within the whole period
    
    References
    ----------
    
    [1] https://pandas.pydata.org/docs/

6. Trading System Optimization


As we said, the most "important" parameters that define the trading system performance are the Takeprofit, the Stoploss and the Volume. These parameters change according to the scenario we face; as a reminder:

  • A: Sell signal
  • B: Sell signal
  • C: Buy signal
  • D: Buy signal

In order to achieve the optimal result we implemented Particle Swarm Optimization (PSO), a heuristic algorithm used here to find the optimal parameter combination. With this in consideration, we need to define an objective function to be minimized.
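The project's fn.get_pso wraps a PSO routine; as an illustration only, a self-contained toy version of the algorithm looks like this (all names, coefficients and the test objective are assumptions for the sketch, not the project's implementation):

```python
import numpy as np

rng = np.random.default_rng(42)

def pso_minimize(f, lb, ub, swarm=30, iters=100, w=0.7, c1=1.5, c2=1.5):
    """Toy particle swarm: track each particle's best and the global best."""
    lb, ub = np.asarray(lb, float), np.asarray(ub, float)
    x = rng.uniform(lb, ub, size=(swarm, lb.size))   # random initial positions
    v = np.zeros_like(x)                             # initial velocities
    pbest = x.copy()
    pbest_f = np.array([f(p) for p in x])
    g = pbest[pbest_f.argmin()].copy()               # global best position
    for _ in range(iters):
        r1, r2 = rng.random(x.shape), rng.random(x.shape)
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (g - x)
        x = np.clip(x + v, lb, ub)                   # keep particles in the box
        fx = np.array([f(p) for p in x])
        improved = fx < pbest_f
        pbest[improved], pbest_f[improved] = x[improved], fx[improved]
        g = pbest[pbest_f.argmin()].copy()
    return g, pbest_f.min()

# Sanity check on a convex bowl whose minimum sits at (1, 2)
xopt, fopt = pso_minimize(lambda p: (p[0] - 1) ** 2 + (p[1] - 2) ** 2,
                          lb=[-5, -5], ub=[5, 5])
print(np.round(xopt, 2))
```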

We are working with trading balances, so to obtain the best risk-return relationship we cannot simply minimize the losses; we need a function that weighs the returns against the risk exposure. We will use the Sharpe Ratio.

The Sharpe Ratio is commonly used for comparing return versus risk. Its formula is presented below:

$$\text{Sharpe Ratio} = \frac{R-R_{f}}{\sigma}$$

Where:

  • $R$: annual expected return of the asset
  • $R_{f}$: annual risk-free rate
  • $\sigma$ : annualized standard deviation of returns
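As a sketch, the ratio and the corresponding PSO objective (PSO minimizes, so the objective returns the negative Sharpe; `sharpe_ratio` and `neg_sharpe` are illustrative stand-ins for fn.max_sharpe, and the monthly annualization factor is an assumption):

```python
import numpy as np

def sharpe_ratio(returns: np.ndarray, rf: float = 0.02237, periods: int = 12) -> float:
    """Annualized Sharpe Ratio from per-period (here monthly) returns."""
    annual_return = np.mean(returns) * periods
    annual_vol = np.std(returns, ddof=1) * np.sqrt(periods)
    return (annual_return - rf) / annual_vol

def neg_sharpe(returns: np.ndarray, rf: float = 0.02237) -> float:
    # PSO minimizes, so the objective hands back the negative of the ratio
    return -sharpe_ratio(returns, rf)

monthly = np.array([0.01, 0.02, -0.005, 0.015, 0.01, 0.0])  # toy monthly returns
print(round(sharpe_ratio(monthly), 2))
```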

To generate a significant profit, we must trade high volumes of the asset. 

| Parameter | Name | Description | Type | Range | Minimum change |
| --- | --- | --- | --- | --- | --- |
| 1 | Volume | The volume in units to start a trading transaction | Numeric | 30,000-50,000 for A and B scenarios, else 5,000-10,000 | 0.0005 |
| 2 | Take Profit | Pip variation that limits our profits | Numeric | Changes according to the scenario, hence the trading decision | 0.0005 |
| 3 | Stop Loss | Pip variation that limits our losses | Numeric | Changes according to the scenario, hence the trading decision | 0.0005 |

The results of the previously described process are presented in the following cells.

In [ ]:
### Let's see the sharpe ratio calculator
help(fn.max_sharpe)

6.1 Optimization for Training Period

In [61]:
%%time
### Let's maximize the parameters for the A scenario
down_limits_a = [limit_low_bull_a, limit_low_bear_a, 30000]
upper_limits_a = [limit_high_bull_a, limit_high_bear_a, 60000]

rf = 0.02237 # Risk free rate associated to the period to analyze

xopt_a, fopt_a = fn.get_pso(fn.max_sharpe, down_limits_a, upper_limits_a, (dt.training_usdmxn, clasification_train, 100000,
                                                                           'A', rf), 50, 0.0005)

print(f'The optimal values for this configuration are:{xopt_a.tolist()}')
Stopping search: maximum iterations reached --> 50
The optimal values for this configuration are:[1070.5282762647882, 56.43038703386708, 60000.0]
CPU times: total: 4min 50s
Wall time: 4min 54s
In [62]:
%%time
### Let's maximize the parameters for the B scenario
down_limits_b = [limit_low_bull_b, limit_low_bear_b, 30000]
upper_limits_b = [limit_high_bull_b, limit_high_bear_b, 60000]

xopt_b, fopt_b = fn.get_pso(fn.max_sharpe, down_limits_b, upper_limits_b, (dt.training_usdmxn, clasification_train, 100000,
                                                                           'B', rf), 50, 0.0005)

print(f'The optimal values for this configuration are:{xopt_b.tolist()}')
Stopping search: maximum iterations reached --> 50
The optimal values for this configuration are:[316.7851313104189, 183.91274692772677, 60000.0]
CPU times: total: 5min 32s
Wall time: 5min 34s
In [63]:
%%time
### Let's maximize the parameters for the C scenario
down_limits_c = [limit_low_bear_c, limit_low_bull_c, 5000]
upper_limits_c = [limit_high_bear_c, limit_high_bull_c, 10000]

xopt_c, fopt_c = fn.get_pso(fn.max_sharpe, down_limits_c, upper_limits_c, (dt.training_usdmxn, clasification_train, 100000,
                                                                           'C', rf), 50, 0.0005)

print(f'The optimal values for this configuration are:{xopt_c.tolist()}')
Stopping search: maximum iterations reached --> 50
The optimal values for this configuration are:[245.32141590528428, 1057.7223455274705, 7129.314231332932]
CPU times: total: 1min 56s
Wall time: 1min 57s
In [64]:
%%time
### Let's maximize the parameters for the D scenario
down_limits_d = [limit_low_bear_d, limit_low_bull_d, 5000]
upper_limits_d = [limit_high_bear_d, limit_high_bull_d, 10000]

xopt_d, fopt_d = fn.get_pso(fn.max_sharpe, down_limits_d, upper_limits_d, (dt.training_usdmxn, clasification_train, 100000,
                                                                           'D', rf), 50, 0.0005)

print(f'The optimal values for this configuration are:{xopt_d.tolist()}')
Stopping search: maximum iterations reached --> 50
The optimal values for this configuration are:[448.7224423344216, 712.4391927582963, 10000.0]
CPU times: total: 3min 39s
Wall time: 3min 40s

6.2 Validation for Test Period

In [70]:
### Let's validate the trading strategy for the optimal A scenario
a_test = fn.get_trading_summary(dt.test_usdmxn, clasification_test, xopt_a[0], xopt_a[1], xopt_a[2], 100000, 'A')
a_test
Out[70]:
Scenario Operation Volume Result Pip Up Pip Down Capital Cumulative Capital
Datetime
2019-01-04 07:30:00 A Sell 60000.0 Won 1070.528276 56.430387 26.275454 100026.275454
2019-02-01 07:30:00 A Sell 60000.0 Won 1070.528276 56.430387 63.640221 100089.915675
2019-04-05 06:30:00 A Sell 60000.0 Won 1070.528276 56.430387 25.252525 100115.168200
2019-06-07 07:30:00 A Sell 60000.0 Won 1070.528276 56.430387 25.856496 100141.024697
2019-07-05 07:30:00 A Sell 60000.0 Won 1070.528276 56.430387 24.691358 100165.716055
2019-08-02 07:30:00 A Sell 60000.0 Won 1070.528276 56.430387 49.885679 100215.601733
2019-09-06 07:30:00 A Sell 60000.0 Won 1070.528276 56.430387 25.316456 100240.918189
2019-11-01 06:30:00 A Sell 60000.0 Won 1070.528276 56.430387 24.479804 100265.397993
In [71]:
### Let's validate the trading strategy for the optimal B scenario
b_test = fn.get_trading_summary(dt.test_usdmxn, clasification_test, xopt_b[0], xopt_b[1], xopt_b[2], 100000, 'B')
b_test
Out[71]:
Scenario Operation Volume Result Pip Up Pip Down Capital Cumulative Capital
Datetime
In [72]:
### Let's validate the trading strategy for the optimal C scenario
c_test = fn.get_trading_summary(dt.test_usdmxn, clasification_test, xopt_c[0], xopt_c[1], xopt_c[2], 100000, 'C')
c_test
Out[72]:
Scenario Operation Volume Result Pip Up Pip Down Capital Cumulative Capital
Datetime
2019-05-03 07:30:00 C Buy 7129.314231 Lost 245.321416 1057.722346 -37.201598 99962.798402
2019-10-04 07:30:00 C Buy 7129.314231 Won 245.321416 1057.722346 8.963932 99971.762334
2019-12-06 07:30:00 C Buy 7129.314231 Lost 245.321416 1057.722346 -37.713565 99934.048770
In [73]:
### Let's validate the trading strategy for the optimal D scenario
d_test = fn.get_trading_summary(dt.test_usdmxn, clasification_test, xopt_d[0], xopt_d[1], xopt_d[2], 100000, 'D')
d_test
Out[73]:
Scenario Operation Volume Result Pip Up Pip Down Capital Cumulative Capital
Datetime
2019-03-08 07:30:00 D Buy 10000.0 Lost 448.722442 712.439193 -34.572169 99965.427831
In [74]:
### Let's see the final balance for the whole trading strategy
final_balance = pd.concat([a_test['Capital'], b_test['Capital'], 
                           c_test['Capital'], d_test['Capital']]).sum()

print(f'The final balance for the trading strategy is of : {round(final_balance, 2)} USD')
The final balance for the trading strategy is of : 164.87 USD

Now, using the optimal parameters obtained through the PSO optimization, we got a final balance for the trading strategy of $\$164.87$ USD. As we can see, the scenarios where we lose the most money are those defined by the Buy strategies (C and D), meaning that, in these cases, the expected behaviour of the currency was not accurate. This does not mean that our trading system is completely wrong: there are many factors that influence this result, starting with the period, since we cannot guarantee that the behaviour of the test set replicates that of the training set; furthermore, there is the parameter definition, as perhaps the ranges analyzed were not the best for the Takeprofit, Stoploss and Volume.

As a recommendation, we would explore other ranges for these parameters in order to better model the behaviour of the USDMXN asset.


7. Final Results and Conclusions


The knowledge acquired in the Market Microstructure and Trading Systems course is reflected throughout this final project. Every stage of the project exercised research skills, analysis, critical thinking, and both empirical and scientific knowledge.

Starting with the selection of the indicator: the unemployment rate is a significant gauge of the strength of an economy, since it is strongly linked to national productivity. Changes in it therefore have a significant impact on the explored currency, especially considering that the two economies involved in the currency pair are strongly connected.

In the first section of the project, the exploration of the databases helps to contextualize the relationship between the indicator and the currency. Empirical validations were used to identify trends that helped define expectations for the later stages of the project.

In the statistical part, the behavior of the indicator's time series is studied in depth; this is extremely important because it is from mastery of this information that the trading system begins to take shape. Applying different statistical tests, we found the series' main statistical properties, the most important being that the sample does not follow a normal distribution, the variance is not constant through time, the series is not stationary, seasonal and autocorrelation components are present, and atypical values were found, occurring during 2020.

By carrying out the classification of scenarios in the financial section, it is possible to recognize patterns based on logical expectations and the historical information of the indicator, and to obtain the metrics that gave rise to the backtest and optimization phase.
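A minimal sketch of one way such an A–D classification can be written, assuming (this rule is our illustration, not necessarily the project's exact definition) that each release is compared against the consensus forecast and the previous release:

```python
def classify_scenario(actual: float, consensus: float, previous: float) -> str:
    """Hypothetical A-D scenario rule: compare the released value against
    the consensus forecast and the previous release. This exact rule is
    an assumption for illustration only."""
    if actual >= consensus:
        return 'A' if consensus >= previous else 'B'
    return 'C' if consensus >= previous else 'D'

# Release below consensus, consensus above previous -> scenario 'C'
print(classify_scenario(3.9, 4.0, 3.8))  # C
```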

When defining the trading system, we were able to identify a search space for the parameters over which the optimization would be performed: volume, stop-loss, and take-profit. It is important to emphasize that this system could be optimized directly on the utility function, but it becomes more sophisticated when a performance-attribution measure is used; we implemented the Sharpe ratio and the Sortino ratio.
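For reference, the two measures can be sketched as below. These are simplified textbook variants (definitions of the downside deviation vary; this one uses the sample standard deviation of the negative excess returns), not the project's `fn` implementation:

```python
import numpy as np

def sharpe_ratio(returns: np.ndarray, rf: float = 0.0) -> float:
    """Mean excess return divided by total volatility."""
    excess = returns - rf
    return excess.mean() / excess.std(ddof=1)

def sortino_ratio(returns: np.ndarray, rf: float = 0.0) -> float:
    """Mean excess return divided by downside deviation only,
    so upside variability is not penalized. One common simplified variant."""
    excess = returns - rf
    downside = excess[excess < 0]
    return excess.mean() / downside.std(ddof=1)

rets = np.array([0.01, -0.02, 0.015, 0.005, -0.01, 0.02])
print(round(sharpe_ratio(rets), 3), round(sortino_ratio(rets), 3))
```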

Developing a trading strategy founded on fundamental analysis, with data granularity down to the minute, gave us a great perspective on the profit opportunities that can be achieved in the market and let us appreciate market dynamics driven by a macroeconomic indicator.

The algorithm used for optimization was PSO, which maximized the profitability of the trading system efficiently, as we ended with a positive balance.
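For completeness, a minimal particle swarm optimization sketch, illustrative of the technique rather than the project's actual optimizer (which searched over volume, stop-loss, and take-profit against the trading summary):

```python
import numpy as np

def pso(f, bounds, n_particles=30, iters=200, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimize f over box bounds with a basic particle swarm:
    each particle is pulled toward its personal best and the global best."""
    rng = np.random.default_rng(seed)
    lo, hi = bounds[:, 0], bounds[:, 1]
    x = rng.uniform(lo, hi, (n_particles, len(lo)))   # positions
    v = np.zeros_like(x)                              # velocities
    pbest, pbest_val = x.copy(), np.array([f(p) for p in x])
    g = pbest[pbest_val.argmin()].copy()              # global best
    for _ in range(iters):
        r1, r2 = rng.random(x.shape), rng.random(x.shape)
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (g - x)
        x = np.clip(x + v, lo, hi)
        val = np.array([f(p) for p in x])
        improved = val < pbest_val
        pbest[improved], pbest_val[improved] = x[improved], val[improved]
        g = pbest[pbest_val.argmin()].copy()
    return g, pbest_val.min()

# Toy objective with its minimum at (3, -2); the swarm should land nearby.
f = lambda p: (p[0] - 3) ** 2 + (p[1] + 2) ** 2
best, best_val = pso(f, np.array([[-10.0, 10.0], [-10.0, 10.0]]))
print(best.round(3), round(best_val, 6))
```

Maximizing a profit function is handled the same way by minimizing its negative.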

It should be noted that the fact that the trading system generates profits shows that the applied theoretical, financial, and computational concepts were sufficient to meet the objective of the project: to create a trading system based on the impact that the unemployment rate has on USDMXN. It is important to consider, however, that our trading system is based on only one economic indicator; the assumption that a single indicator can predict the direction of the currency with high accuracy from past data is unrealistic.

